LIDS REPORT 2876
Weighted Bellman Equations and their Applications in Approximate Dynamic Programming ∗
Authors
Abstract
We consider approximation methods for Markov decision processes in the learning and simulation context. For policy evaluation based on solving approximate versions of a Bellman equation, we propose the use of weighted Bellman mappings. Such mappings comprise weighted sums of one-step and multistep Bellman mappings, where the weights depend on both the step and the state. For projected versions of the associated Bellman equations, we show that their solutions have the same nature and essential approximation properties as the commonly used approximate solutions from TD(λ). The most important feature of our framework is that each state can be associated with a different type of mapping. Compared with the standard TD(λ) framework, this gives a more flexible way to combine multistage costs and state transition probabilities in approximate policy evaluation, and provides alternative means for bias-variance control. With weighted Bellman mappings, there is also greater flexibility to design learning and simulation-based algorithms. We demonstrate this with examples, including new TD-type algorithms with state-dependent λ parameters, as well as block versions of the algorithms. Weighted Bellman mappings can also be applied in approximate policy iteration: we provide several examples, including some new optimistic policy iteration schemes. Another major feature of our framework is that the projection need not be based on a norm, but rather can use a semi-norm. This allows us to establish a close connection between projected equation and aggregation methods, and to develop for the first time multistep aggregation methods, including some of the TD(λ)-type.

Oct 2012

∗ Work supported by the Air Force Grant FA9550-10-1-0412.
† Lab. for Information and Decision Systems, M.I.T. janey [email protected]
‡ Lab. for Information and Decision Systems, M.I.T. [email protected]
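To make the central object concrete, here is a minimal numerical sketch of a weighted Bellman mapping for policy evaluation on a small Markov chain. All numbers (the transition matrix P, costs g, and the per-state weight matrix w) are illustrative assumptions, not taken from the paper; the sketch only demonstrates that a per-state convex combination of the multistep mappings T, T², T³ has the same fixed point as the one-step mapping T itself.

```python
import numpy as np

alpha = 0.9                                  # discount factor (assumed)
P = np.array([[0.5, 0.5, 0.0],               # policy's transition matrix (assumed)
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
g = np.array([1.0, 2.0, 0.5])                # expected one-stage costs (assumed)

def bellman(J):
    """One-step Bellman mapping T(J) = g + alpha * P @ J."""
    return g + alpha * P @ J

def weighted_bellman(J, w):
    """Weighted mapping: state i uses weights w[i, l] over T^(l+1)(J)(i)."""
    n, L = w.shape
    powers = np.empty((L, n))
    TJ = J.copy()
    for l in range(L):                       # compute T^(l+1)(J) for l = 0..L-1
        TJ = bellman(TJ)
        powers[l] = TJ
    return np.einsum('il,li->i', w, powers)  # per-state weighted sum

# State-dependent weights: state 0 leans on the one-step mapping,
# state 2 on the three-step mapping (each row sums to 1).
w = np.array([[0.8, 0.1, 0.1],
              [1/3, 1/3, 1/3],
              [0.1, 0.1, 0.8]])

J = np.zeros(3)
for _ in range(200):                         # fixed-point iteration converges,
    J = weighted_bellman(J, w)               # since the mapping is a contraction

J_exact = np.linalg.solve(np.eye(3) - alpha * P, g)  # J* = (I - alpha*P)^{-1} g
print(np.allclose(J, J_exact, atol=1e-6))    # the weighted fixed point is J*
```

Because each row of w sums to one and T^ℓ is an α^ℓ-contraction, the weighted mapping is itself a contraction whose fixed point coincides with that of T; the approximation issues studied in the paper arise only once this mapping is composed with a (semi-norm) projection.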
Similar resources
Optimizing Iran's Oil Production Path: An Optimal Control Dynamic Programming Model
In this article we present a dynamic programming model for oil production in Iran. To this end, we formulate the discounted-profit objective function as a Bellman equation, and hence view the problem as a dynamic programming problem. Specifically, in order to model the flow of liquids in oil tanks, we use differential eq...
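As a generic illustration of the formulation this abstract describes (with hypothetical numbers and a discretized stock, not the article's actual model), a discounted-profit extraction problem can be written as a Bellman equation and solved by value iteration:

```python
# Hedged sketch: V(s) = max_{0 <= a <= s} [ p*a - c*a^2 + beta * V(s - a) ],
# i.e., discounted profit from extracting a units out of a stock of s.
# All parameter values below are illustrative assumptions.
beta, p, c = 0.95, 1.0, 0.05    # discount factor, price, quadratic cost coeff.
S = 21                          # discretized stock levels 0, 1, ..., 20

V = [0.0] * S
for _ in range(500):            # value iteration; converges since beta < 1
    V = [max(p * a - c * a * a + beta * V[s - a] for a in range(s + 1))
         for s in range(S)]

# Greedy extraction policy with respect to the converged values.
policy = [max(range(s + 1), key=lambda a: p * a - c * a * a + beta * V[s - a])
          for s in range(S)]
```

The same fixed-point structure carries over when the stock dynamics are given by differential equations, as in the article; the discrete sketch above just makes the Bellman recursion explicit.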
Full text
A Comparison of Parametric Approximation Techniques to Continuous-Time Stochastic Dynamic Programming Problems
The views and interpretations expressed in these Reports are those of the author(s) and should not be attributed to any organisation associated with the EERH. Contents: Abstract; I. Introduction; II. A generalized stochastic optimal control problem in continuous-time setting; III. Parametric approximation approaches to HJB equations; IV. Case study 1: Unidimensional standard fishery problem; V. Cas...
Full text
Approximate Dynamic Programming for Ship Course Control
Dynamic programming (DP) is a useful tool for solving many control problems, but because of its computational complexity, traditional DP control algorithms are often impractical. We therefore need a method that retains the advantages of DP while being computationally cheaper. In this paper, an approximate dynamic programming (ADP) based controller system is used to solve a ship he...
Full text
An Overview of Research on Adaptive Dynamic Programming
Adaptive dynamic programming (ADP) is a novel approximate optimal control scheme, which has recently become a hot topic in the field of optimal control. As a standard approach in ADP, a function approximation structure is used to approximate the solution of the Hamilton-Jacobi-Bellman (HJB) equation. The approximate optimal control policy is obtained by using the offline iteration algo...
Full text
Uniqueness Results for Second-Order Bellman--Isaacs Equations under Quadratic Growth Assumptions and Applications
In this paper, we prove a comparison result between semicontinuous viscosity sub- and supersolutions, growing at most quadratically, of second-order degenerate parabolic Hamilton-Jacobi-Bellman and Isaacs equations. As an application, we characterize the value function of a finite-horizon stochastic control problem with unbounded controls as the unique viscosity solution of the corresponding dynam...
Full text